Answer the following questions and complete the exercises in RMarkdown. Please embed all of your code and push your final work to your repository. Your final lab report should be organized, clean, and run free from errors. Remember, you must remove the # for the included code chunks to run. Be sure to add your name to the author header above. For any included plots, make sure they are clearly labeled. You are free to use any plot type that you feel best communicates the results of your analysis.
Make sure to use the formatting conventions of RMarkdown to make your report neat and clean!
library(qtl)
library(qtlcharts)
library(tidyverse)
library(ggmap)
1. We have satellite collars on a number of different individuals and want to be able to quickly look at all of their recent movements at once. Please load all the data files from us_individual_collar_data and use a for loop to create a plot for each individual showing the path it moved along in longitude and latitude.
data_files <- list.files("data/us_individual_collar_data", full.names = TRUE)
data_files
## [1] "data/us_individual_collar_data/collar-data-A1-2016-02-26.txt"
## [2] "data/us_individual_collar_data/collar-data-B2-2016-02-26.txt"
## [3] "data/us_individual_collar_data/collar-data-C3-2016-02-26.txt"
## [4] "data/us_individual_collar_data/collar-data-D4-2016-02-26.txt"
## [5] "data/us_individual_collar_data/collar-data-E5-2016-02-26.txt"
## [6] "data/us_individual_collar_data/collar-data-F6-2016-02-26.txt"
## [7] "data/us_individual_collar_data/collar-data-G7-2016-02-26.txt"
## [8] "data/us_individual_collar_data/collar-data-H8-2016-02-26.txt"
## [9] "data/us_individual_collar_data/collar-data-I9-2016-02-26.txt"
## [10] "data/us_individual_collar_data/collar-data-J10-2016-02-26.txt"
for (i in seq_along(data_files)) {
  us_individual <- as.data.frame(read_csv(data_files[i]))
  print(
    ggplot(us_individual, aes(x = long, y = lat)) +
      geom_path() + geom_point() +
      labs(title = data_files[i], x = "Longitude", y = "Latitude")
  )
}
## Warning: Missing column names filled in: 'X1' [1]
##
## -- Column specification --------------------------------------------------------
## cols(
## X1 = col_double(),
## date = col_date(format = ""),
## collar = col_character(),
## time = col_datetime(format = ""),
## lat = col_double(),
## long = col_double()
## )
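The plot titles above use the full file path, which is long for a title. A minimal sketch of pulling out just the collar ID, assuming every file name follows the `collar-data-<ID>-<date>.txt` pattern seen in the listing. The helper name `collar_id` is hypothetical and not part of the lab.

```r
# Hypothetical helper: extract the collar ID (e.g. "A1") from a file path,
# assuming names of the form collar-data-<ID>-<YYYY-MM-DD>.txt.
collar_id <- function(path) {
  sub("^collar-data-([A-Z][0-9]+)-.*$", "\\1", basename(path))
}

# Two of the paths from the list.files() output above.
example_files <- c(
  "data/us_individual_collar_data/collar-data-A1-2016-02-26.txt",
  "data/us_individual_collar_data/collar-data-J10-2016-02-26.txt"
)
ids <- collar_id(example_files)
print(ids)
```

Inside the loop, something like `labs(title = paste("Collar", collar_id(data_files[i])))` would then give each plot a short, readable title.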
2. Please load all the data files from us_individual_collar_data and combine them into one data frame. Create a summary that shows the maximum and minimum longitude and latitude among the recorded data points.
us_list <- lapply(data_files, read_csv)
## Warning: Missing column names filled in: 'X1' [1]
##
## -- Column specification --------------------------------------------------------
## cols(
## X1 = col_double(),
## date = col_date(format = ""),
## collar = col_character(),
## time = col_datetime(format = ""),
## lat = col_double(),
## long = col_double()
## )
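# fixed = TRUE matches ".txt" literally instead of treating "." as a regex wildcard
names_list <- strsplit(data_files, split = ".txt", fixed = TRUE)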
names_vec <- unlist(names_list)
names(us_list) <- names_vec
us_all <- bind_rows(us_list)
us_all %>%
  summarise("Maximum Latitude" = max(lat),
            "Minimum Latitude" = min(lat),
            "Maximum Longitude" = max(long),
            "Minimum Longitude" = min(long))
## # A tibble: 1 x 4
##   `Maximum Latitude` `Minimum Latitude` `Maximum Longitude` `Minimum Longitude`
##                <dbl>              <dbl>               <dbl>               <dbl>
## 1               41.6               26.6               -106.               -123.
3. Use the range of the latitude and longitude from Q2 to build an appropriate bounding box for your map, load a terrain-style map from Stamen, and display the map. Then, build a final map that overlays the recorded paths from Q1.
lat <- c(26.6, 41.6)
long <- c(-122.6, -106.3)
bbox <- make_bbox(long, lat, f=0.05)
map <- get_map(bbox, maptype="terrain", source = "stamen")
## Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL.
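The bounding box above is built from hand-copied corner values padded by `f=0.05`. A minimal base-R sketch of the same padding idea follows. The helper `padded_bbox` is hypothetical: it illustrates expanding each range by a fraction `f` on both sides and is not ggmap's own implementation of `make_bbox()`.

```r
# Hypothetical helper: pad the longitude and latitude ranges by a fraction f
# of their width on each side (illustration only, not ggmap's code).
padded_bbox <- function(lon, lat, f = 0.05) {
  lon_range <- grDevices::extendrange(lon, f = f)
  lat_range <- grDevices::extendrange(lat, f = f)
  c(left = lon_range[1], bottom = lat_range[1],
    right = lon_range[2], top = lat_range[2])
}

# Same corner values as in the answer above.
bbox_sketch <- padded_bbox(lon = c(-122.6, -106.3), lat = c(26.6, 41.6))
print(bbox_sketch)
```

Passing the full columns of the combined data frame (for example `us_all$long` and `us_all$lat`) instead of typed-in corners would keep the box in step with the data if the files change.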
for (i in seq_along(data_files)) {
  us_individual <- as.data.frame(read_csv(data_files[i]))
  print(
    ggmap(map) +
      geom_path(data = us_individual, aes(long, lat)) +
      geom_point(data = us_individual, aes(long, lat)) +
      labs(title = data_files[i], x = "Longitude", y = "Latitude")
  )
}
## Warning: Missing column names filled in: 'X1' [1]
##
## -- Column specification --------------------------------------------------------
## cols(
## X1 = col_double(),
## date = col_date(format = ""),
## collar = col_character(),
## time = col_datetime(format = ""),
## lat = col_double(),
## long = col_double()
## )
We will use the data from an experiment on hypertension in the mouse (Sugiyama et al., Genomics 71:70-77, 2001).
#?hyper
data(hyper)
4. Create a summary of the hypertension data. How many individuals and phenotypes are included in this data set? How many gene markers and chromosomes are included in this data set? Please create a table to show the number of markers on each chromosome.
nmar(hyper)
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19  X
## 22  8  6 20 14 11  7  6  5  5 14  5  5  5 11  6 12  4  4  4
summary(hyper)
## Backcross
##
## No. individuals: 250
##
## No. phenotypes: 2
## Percent phenotyped: 100 100
##
## No. chromosomes: 20
## Autosomes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
## X chr: X
##
## Total markers: 174
## No. markers: 22 8 6 20 14 11 7 6 5 5 14 5 5 5 11 6 12 4 4 4
## Percent genotyped: 47.7
## Genotypes (%):
## Autosomes: BB:50.1 BA:49.9
## X chromosome: BY:53.0 AY:47.0
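Q4 asks for a table of markers per chromosome, while `nmar()` prints a named vector. A minimal sketch of reshaping those counts into a two-column table, using the numbers copied from the output above so that it runs without the qtl package. The object names `n_markers` and `marker_table` are illustrative.

```r
# Marker counts per chromosome, copied from the nmar(hyper) output above.
# With R/qtl loaded, n_markers <- nmar(hyper) should give the same named vector.
chromosomes <- c(1:19, "X")
n_markers <- c(22, 8, 6, 20, 14, 11, 7, 6, 5, 5, 14, 5, 5, 5, 11, 6, 12, 4, 4, 4)
names(n_markers) <- chromosomes

# One row per chromosome.
marker_table <- data.frame(
  chromosome = names(n_markers),
  markers = as.integer(n_markers)
)
print(marker_table)

# The total should agree with "Total markers" in summary(hyper).
print(sum(marker_table$markers))
```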
5. Please make an interactive genetic map of markers for the hypertension data.
iplotMap(hyper)
## Set screen size to height=700 x width=1000
6. Make a plot that shows the pattern of missing genotype data in the hypertension dataset. Please reorder the recorded individuals according to their blood pressure phenotypes. Is there a specific pattern of missing genotypes? Please explain it.
plotMissing(hyper, main="")
plotMissing(hyper, main="", reorder=1)
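Once the individuals are reordered by blood pressure, the missing genotypes are concentrated in roughly the middle 150 individuals, while individuals at either end of the phenotype range have far more genotype data. The pattern is only visible after reordering by phenotype.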
7. Based on your answer to the previous question, you probably noticed that there are genetic markers without data. Please use the function drop.nullmarkers to remove markers that have no genotype data. After this, make a new summary to show the number of markers on each chromosome. How many markers were dropped? Where were the dropped markers located? Please use the data without null markers for the following questions.
hyper <- drop.nullmarkers(hyper)
summary(hyper)
## Backcross
##
## No. individuals: 250
##
## No. phenotypes: 2
## Percent phenotyped: 100 100
##
## No. chromosomes: 20
## Autosomes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
## X chr: X
##
## Total markers: 173
## No. markers: 22 8 6 20 14 11 7 6 5 5 14 5 5 4 11 6 12 4 4 4
## Percent genotyped: 48
## Genotypes (%):
## Autosomes: BB:50.1 BA:49.9
## X chromosome: BY:53.0 AY:47.0
One marker was dropped: the total went from 174 to 173 markers, and the count on chromosome 14 went from 5 to 4.
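The dropped marker can also be located programmatically rather than by comparing the two summaries by eye. A minimal sketch, using the per-chromosome counts copied from the two `summary(hyper)` outputs above. In a live session the vectors would come from `nmar(hyper)` called before and after `drop.nullmarkers()`, and the names `before`, `after`, and `dropped` are illustrative.

```r
chromosomes <- c(1:19, "X")

# Counts copied from the summaries before and after drop.nullmarkers().
before <- setNames(c(22, 8, 6, 20, 14, 11, 7, 6, 5, 5, 14, 5, 5, 5, 11, 6, 12, 4, 4, 4),
                   chromosomes)
after  <- setNames(c(22, 8, 6, 20, 14, 11, 7, 6, 5, 5, 14, 5, 5, 4, 11, 6, 12, 4, 4, 4),
                   chromosomes)

# Keep only the chromosomes that lost at least one marker.
dropped <- before - after
dropped <- dropped[dropped > 0]
print(dropped)
```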
8. Please conduct a single-QTL analysis and create a table that gives the maximum LOD score on each chromosome based on the blood pressure phenotype. Which genetic marker has the highest LOD score? Which chromosome contains that marker? Then, create an interactive chart with LOD curves from a genome scan for all chromosomes.
hyper <- calc.genoprob(hyper, step=1)
lod_score <- scanone(hyper)
summary(lod_score) %>%
arrange(desc(lod))
##           chr  pos   lod
## D4Mit164    4 29.5 8.094
## c1.loc45    1 48.3 3.529
## c6.loc23    6 23.0 1.862
## c15.loc14  15 19.5 1.730
## c2.loc45    2 52.7 1.612
## c5.loc68    5 68.0 1.554
## cX.loc38    X 39.1 0.998
## D19Mit59   19  0.0 0.792
## D8Mit271    8 59.0 0.791
## c3.loc33    3 35.2 0.784
## D9Mit18     9 68.9 0.750
## c11.loc36  11 38.2 0.668
## D18Mit17   18 14.2 0.506
## D12Mit37   12  1.1 0.429
## D7Mit297    7 26.2 0.400
## D16Mit70   16 51.4 0.370
## D13Mit78   13 59.0 0.313
## c10.loc8   10 10.2 0.261
## D17Mit46   17  3.3 0.207
## D14Mit7    14 52.5 0.106
iplotScanone(lod_score)
The marker with the highest LOD score is D4Mit164 (LOD 8.09), located on chromosome 4 at position 29.5.
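The peak can be read off the sorted table, but it can also be picked out in code. A minimal sketch, using the top four rows copied from the `summary(lod_score)` output above so that it runs without the qtl package. The data frame name `peaks` is illustrative.

```r
# Top rows copied from the sorted summary(lod_score) table above.
peaks <- data.frame(
  chr = c("4", "1", "6", "15"),
  pos = c(29.5, 48.3, 23.0, 19.5),
  lod = c(8.094, 3.529, 1.862, 1.730),
  row.names = c("D4Mit164", "c1.loc45", "c6.loc23", "c15.loc14")
)

# Row with the largest LOD score; its row name is the marker.
best <- peaks[which.max(peaks$lod), ]
print(best)

# A LOD score is a log10 likelihood ratio, so 10^LOD gives the ratio itself.
print(10^best$lod)
```

With the real scan object, applying `which.max()` to `summary(lod_score)$lod` in the same way should select the same row.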
9. Based on your genome scan results, create a table that includes only those chromosomes with LOD > 1. Create an interactive chart with LOD curves linked to estimated QTL effects for only those chromosomes with LOD > 1. Find the genetic marker with the highest LOD score and describe how the genotype of this marker influences the individual's phenotype.
summary(lod_score, threshold=1)
##           chr  pos  lod
## c1.loc45    1 48.3 3.53
## c2.loc45    2 52.7 1.61
## D4Mit164    4 29.5 8.09
## c5.loc68    5 68.0 1.55
## c6.loc23    6 23.0 1.86
## c15.loc14  15 19.5 1.73
iplotLOD <- iplotScanone(lod_score, hyper, chr=c(1,2,4,5,6,15))
iplotLOD
At D4Mit164, the marker with the highest LOD score, individuals that are homozygous (BB) have higher blood pressure on average than individuals that are heterozygous (BA).
10. Please save your interactive chart from Q9 as an HTML file named hyper_iplotScanone.html, and make sure you upload it to your GitHub repository along with your lab14 homework.
htmlwidgets::saveWidget(iplotLOD, file="hyper_iplotScanone.html")
Please be sure that you check the "keep md file" option in the knit preferences.